NISTIR XXXX Draft 


Ongoing Face Recognition 
Vendor Test (FRVT) 

Part 1: Verification 


Patrick Grother 
Mei Ngan 
Kayee Hanaoka 
Information Access Division 
Information Technology Laboratory 

This publication is available free of charge from: 
https://www.nist.gov/programs-projects/face-recognition-vendor-test-frvt-ongoing


NIST 

National Institute of 
Standards and Technology 

U.S. Department of Commerce 



FRVT - FACE RECOGNITION VENDOR TEST - VERIFICATION 




DISCLAIMER 

Specific hardware and software products identified in this report were used in order to perform the evaluations
described in this document. In no case does identification of any commercial product, trade name, or
vendor, imply recommendation or endorsement by the National Institute of Standards and Technology, nor 
does it imply that the products and equipment identified are necessarily the best available for the purpose. 


ABOUT THIS REPORT 


This report is a draft NIST Interagency Report, and is open for comment. It documents the verification track
of the ongoing Face Recognition Vendor Test. The report will be updated continuously as new algorithms are 
evaluated, as new datasets are added, and as new analyses are included. Comments and suggestions should 
be directed to frvt@nist.gov. 


FNMR(T) "False non-match rate"
FMR(T) "False match rate" 
1 Changelog

1.1 July 29 2017 

> Added 8 new algorithms 

> Added results for a child-exploitation dataset 

> Added Table 2, a standalone tabulation of false non-match rates

> We have received additional CPU algorithms - Results should appear August 4, 2017

> We have received additional GPU algorithms - Results to appear as computational resources are released from the 
Face Recognition Prize Challenge 

1.2 June 19 2017 

> Added five new algorithms, three of which remain in-process 

> Added results for a "wild" dataset of images similar to non-cooperative photojournalism images 

> Added Table 3, a standalone tabulation of failure to enrol rates

> Added Fig. 1 showing tradeoff between FNMR, template size, template generation time, and match duration. 

> Added Fig. 13 showing how FMR is concentrated in certain images. 

> Restated cross-region false match rates at nominal FMR = 0.0001 instead of 0.001 

> Improved DET legends. 
Developer                         Short       Seq.  Validation  Config [1]  Template      Template      GPU  Comparison Time (ns) [3]
Name                              Name        Num.  Date        Data (KB)   Size (B)      Time (ms) [2]      Genuine             Impostor

3DiVi                             3divi       000   2017-03-16  169360      512 ± 0       285 ± 52      Yes  378 ± 20            375 ± 19
Ayonix                            ayonix      000   2017-06-22  58505       1036 ± 0      18 ± 2        No   621 ± 23            620 ± 26
Dermalog                          dermalog    001   2017-02-22  0           1043 ± 0      106 ± 1       No   22160 ± 166         22131 ± 131
Dermalog                          dermalog    002   2017-02-22  0           1043 ± 0      81 ± 0        No   22169 ± 114         22105 ± 146
Dermalog                          dermalog    003   2017-07-10  0           1043 ± 0      121 ± 22      No   22957 ± 93          22808 ± 131
Digital Barriers                  barriers    000   2017-05-31  157794      2056 ± 0      104 ± 0       No   13232 ± 166         13226 ± 146
Digital Barriers                  barriers    001   2017-07-20  236934      2056 ± 0      294 ± 1       No   12311 ± 164         12347 ± 197
Is It You                         isityou     000   2017-06-26  48010       19200 ± 0     113 ± 5       No   237517 ± 1318       237374 ± 1279
ITMO University                   itmo        001   2017-06-12  751338      37997 ± 0     870 ± 4       No   29119 ± 1420        27817 ± 340
Morpho                            morpho      000   2017-07-11  100806      116 ± 0       109 ± 1       Yes  993 ± 31            1000 ± 34
Neurotechnology                   neurotech   000   2017-03-22  62129       7148 ± 0      611 ± 48      No   74288 ± 2194        72879 ± 2640
N-Tech Lab                        ntech       000   2017-03-13  191530      2906 ± 1      278 ± 13      No   30787 ± 142         30846 ± 77
N-Tech Lab                        ntech       001   2017-05-10  691296      6744 ± 1      587 ± 11      No   67692 ± 833         67486 ± 244
Rank One Computing                rankone     000   2017-03-21  0           144 ± 0       82 ± 9        No   39932 ± 468         8722 ± 171
Rank One Computing                rankone     001   2017-04-12  0           208 ± 0       56 ± 4        No   72754 ± 658         345 ± 29
Samtech InfoNet Limited           samtech     000   2017-05-02  109774      2056 ± 0      262 ± 2       No   4550 ± 26           4541 ± 28
TongYi Transportation Technology  tongyi      001   2017-04-01  625339      2058 ± 0      310 ± 20      No   17769 ± 74          17750 ± 63
TongYi Transportation Technology  tongyi      002   2017-07-15  625336      2058 ± 0      356 ± 35      No   29816 ± 281         17799 ± 127
Tupel                             tupel       001   2017-05-05  6347        4142 ± 0      273 ± 0       No   56353 ± 618         59204 ± 563
VCognition                        vcog        001   2017-03-28  86103       4126 ± 0      108 ± 17      Yes  16320 ± 197         16426 ± 425
VCognition                        vcog        002   2017-06-12  3229434     61504 ± 5     357 ± 25      No   296154 ± 3077       296436 ± 4183
Vigilant Solutions                vigilant    000   2017-03-30  352218      31540 ± 0     884 ± 23      No   18201 ± 94          13030 ± 83
Vigilant Solutions                vigilant    001   2017-06-13  344685      1544 ± 0      921 ± 2       No   644 ± 13            649 ± 16
VisionLabs                        visionlabs  001   2017-06-12  343661      204 ± 0       943 ± 8       No   1395 ± 45           1148 ± 53
Vocord                            vocord      001   2017-04-21  616989      6194 ± 0      908 ± 16      No   1094730 ± 64282     1107193 ± 66523
Vocord                            vocord      002   2017-06-07  918292      1330 ± 0      782 ± 36      Yes  83063 ± 517         83072 ± 714
Shanghai Yitu Technology          yitu        000   2017-05-23  2211068     4130 ± 0      672 ± 2       No   35352 ± 114         37848 ± 1773

Notes

[1] The size of configuration data does not capture static data included in the libraries. We do not include the size of the libraries because
some algorithms include common ancillary libraries for image processing (e.g. OpenCV) or numerical computation (e.g. BLAS).

[2] The median template creation times are measured on Intel® Xeon® CPU E5-2630 v4 @ 2.20GHz processors or, in the case of GPU-enabled
implementations, NVidia Tesla K40.

[3] The median comparison durations, in nanoseconds, are estimated using std::chrono::high_resolution_clock which on the machine in [2]
counts clock ticks of duration 1 ns. Precision is somewhat worse than that, however. The ± value is the median absolute deviation times
1.48 for Normal consistency.

Table 1: Summary of algorithms and properties included in this report.
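
The "median ± spread" entries in Table 1 follow the rule stated in the third note: the spread is the median absolute deviation (MAD) multiplied by 1.48. The following is a minimal Python sketch of that summary, not the code used in the test. The function name and the input list of timing samples are hypothetical.

```python
import statistics

def median_and_scaled_mad(durations_ns, scale=1.48):
    """Return (median, scale * MAD) for a list of timing samples.

    The 1.48 factor is the one quoted in the Table 1 notes; it makes the
    MAD comparable to a standard deviation for Normal data.
    """
    med = statistics.median(durations_ns)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median(abs(x - med) for x in durations_ns)
    return med, scale * mad

# Hypothetical timing samples in nanoseconds, with one outlier.
median_ns, spread_ns = median_and_scaled_mad([1, 2, 3, 4, 100])
print(median_ns, spread_ns)
```

Unlike a mean and standard deviation, this summary is barely affected by the single outlier in the example, which is presumably why a robust statistic is used for timing data.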
Algorithm                         FALSE NON-MATCH RATE (FNMR)
Name                              CHILD EXP  MUGSHOT    SELFIE     VISA       VISA       WEBCAM     WILD
FMR                               0.01       0.0001     0.0001     1E-06      0.0001     0.0001     0.0001

3divi-000/2017-03-16              0.553      0.037      0.055      0.133      0.029      0.002      0.547
ayonix-000/2017-06-22             0.843      0.309      0.360      0.487      0.230      0.172      0.807
dermalog-001/2017-02-22           0.985      0.237      0.193      0.305      0.113      0.112      0.704
dermalog-002/2017-02-22           0.985      0.241      0.179      0.315      0.122      0.109      0.709
dermalog-003/2017-07-10           0.845      0.202      0.115      0.280      0.112      0.041      0.693
digitalbarriers-000/2017-05-31    0.771      0.184      0.170      0.463      0.161      0.045      0.741
digitalbarriers-001/2017-07-20    -          0.041      0.115      0.502      0.155      0.029      0.678
isityou-000/2017-06-26            -          0.680      -          0.703      0.414      0.690      1.000
itmo-001/2017-06-12               0.797      -          -          0.441      0.171      -          -
morpho-000/2017-07-11             0.846      0.028      0.012      0.134      0.026      0.007      0.893
neurotechnology-000/2017-03-22    0.845      0.062      0.052      0.237      0.052      0.005      0.943
ntechlab-000/2017-03-13           0.533      0.044      0.014      0.086      0.027      0.003      0.367
ntechlab-001/2017-05-10           0.472      0.030      0.014      0.083      0.025      0.003      0.319
rankone-000/2017-03-21            0.787      0.177      0.092      0.276      0.071      0.021      0.723
rankone-001/2017-04-12            0.858      0.091      0.176      0.420      0.171      0.024      0.842
samtech-000/2017-05-02            0.765      0.044      0.063      0.443      0.161      0.021      0.878
tongyitrans-001/2017-04-01        0.743      0.041      0.063      0.072      0.038      0.009      0.704
tongyitrans-002/2017-07-15        0.746      0.039      0.063      0.066      0.030      0.010      0.725
tupel-001/2017-05-05              -          0.641      -          1.000      0.680      -
vcog-001/2017-03-28               0.686      -          0.427      0.892      0.409      0.302      -
vcog-002/2017-06-12               0.752      0.692      0.666      0.903      0.504      0.559      0.778
vigilantsolutions-000/2017-03-30  0.894      0.595      0.643      0.688      0.415      0.401      0.915
vigilantsolutions-001/2017-06-13  0.730      0.101      0.061      0.348      0.105      0.016      0.729
visionlabs-001/2017-06-12         0.561      0.024      0.014      0.180      0.030      0.001      0.591
vocord-001/2017-04-21             0.695      0.063      0.069      0.141      0.035      0.036      0.654
vocord-002/2017-06-07             0.762      0.019      0.012      0.034      0.013      -          0.948
yitu-000/2017-05-23               0.586      0.017      0.012      0.033      0.021      0.000      0.431

Table 2: FNMR is the proportion of mated comparisons below a threshold.
2 Metrics

2.1 Core accuracy 

Given a vector of N genuine scores, u, the false non-match rate (FNMR) is computed as the proportion below some
threshold, T:

$$\mathrm{FNMR}(T) = 1 - \frac{1}{N} \sum_{i=1}^{N} H(u_i - T) \qquad (1)$$

where H(x) is the unit step function, and H(0) is taken to be 1.

Similarly, given a vector of N impostor scores, v, the false match rate (FMR) is computed as the proportion above T:

$$\mathrm{FMR}(T) = \frac{1}{N} \sum_{i=1}^{N} H(v_i - T) \qquad (2)$$
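
Equations (1) and (2) translate directly into code. The following is a minimal Python sketch, assuming scores are held in plain lists and that higher scores indicate greater similarity. The function names are illustrative and are not part of the FRVT API.

```python
def fnmr(genuine_scores, threshold):
    """False non-match rate: proportion of genuine scores below the threshold (Eq. 1)."""
    n = len(genuine_scores)
    # H(u - T) is 1 when u >= T, because H(0) is taken to be 1.
    at_or_above = sum(1 for u in genuine_scores if u >= threshold)
    return 1.0 - at_or_above / n

def fmr(impostor_scores, threshold):
    """False match rate: proportion of impostor scores at or above the threshold (Eq. 2)."""
    n = len(impostor_scores)
    return sum(1 for v in impostor_scores if v >= threshold) / n

# Hypothetical scores for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
impostor = [0.1, 0.2, 0.75, 0.3]
print(fnmr(genuine, 0.7), fmr(impostor, 0.7))
```

Because H(0) = 1, a score exactly equal to the threshold counts as a match in both functions.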

The threshold, T, can take on any value. We typically generate a set of thresholds from quantiles of the observed impostor
scores, v, as follows. Given some interesting false match rate range, [FMR_L, FMR_U], we form a vector of K thresholds
corresponding to FMR measurements evenly spaced on a logarithmic scale

$$T_k = Q_v(1 - \mathrm{FMR}_k) \qquad (3)$$

where Q is the quantile function, and FMR_k comes from

$$\log_{10} \mathrm{FMR}_k = \log_{10} \mathrm{FMR}_L + \frac{k}{K} \left[ \log_{10} \mathrm{FMR}_U - \log_{10} \mathrm{FMR}_L \right] \qquad (4)$$

Error tradeoff characteristics are plots of FNMR(T) vs. FMR(T). These are plotted with FMR_U → 1 and FMR_L as low
as is sustained by the number of impostor comparisons, N. This is somewhat higher than the "rule of three" limit 3/N
because samples are not independent, due to re-use of images.
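
The threshold-generation procedure of Equations (3) and (4) can be sketched as follows. This is an illustration under stated assumptions, not the report's implementation: the index k is assumed to run from 1 to K, the quantile function is a simple empirical quantile without interpolation, and all function names are hypothetical.

```python
import math

def log_spaced_fmr(fmr_low, fmr_high, K):
    """K target FMR values evenly spaced on a log10 scale (Eq. 4), assuming k = 1..K."""
    lo, hi = math.log10(fmr_low), math.log10(fmr_high)
    return [10 ** (lo + (k / K) * (hi - lo)) for k in range(1, K + 1)]

def empirical_quantile(sorted_scores, p):
    """Empirical quantile of pre-sorted scores, with no interpolation (an assumption)."""
    n = len(sorted_scores)
    idx = min(n - 1, max(0, math.floor(p * n)))
    return sorted_scores[idx]

def thresholds_from_impostors(impostor_scores, fmr_low, fmr_high, K):
    """Return (target FMR, threshold) pairs, with T_k = Q_v(1 - FMR_k) as in Eq. 3."""
    v = sorted(impostor_scores)
    return [(f, empirical_quantile(v, 1.0 - f)) for f in log_spaced_fmr(fmr_low, fmr_high, K)]

# Hypothetical impostor scores: the integers 0..9999.
for target, t in thresholds_from_impostors(list(range(10000)), 1e-3, 1e-1, 3):
    print(f"target FMR {target:.4g} -> threshold {t}")
```

With N impostor scores the achievable FMR values are multiples of 1/N, so the measured FMR at each threshold can differ from its target by about that amount.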


3 Datasets

3.1 Child exploitation images 

> The number of images is O(10^4).

> The number of subjects is O(10^3).

> The number of subjects with two images is O(10^3).

> The images are operational. They are taken from ongoing investigations of child exploitation crimes. The images
are arbitrarily unconstrained. Pose varies considerably around all three axes, including subject lying down. Resolution
varies very widely. Faces can be occluded by other objects, including hair and hands. Lighting varies,
although the images are intended for human viewing. Mis-focus is rare. Images are given to the algorithm without
any cropping; faces may occupy widely varying areas.

> The images are usually large from contemporary cameras. The mean interocular distance (IOD) is 70 pixels. 

> The images are of subjects from several countries, with significant imbalance due to visa issuance patterns. 

> The images are of children, from infancy to late adolescence.

> All of the images are live capture, none are scanned. Many have been cropped. 

> When these images are input to the algorithm, they are labelled as being of type "EXPLOITATION" - see Table 4 of 
the FRVT API. 

3.2 Visa images 

> The number of images is O(10^5).

> The number of subjects is O(10^5).

> The number of subjects with two images is O(10^4).

> The images have geometry in reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type. Pose is 
generally excellent. 

> The images are of size 252x300 pixels. The mean interocular distance (IOD) is 69 pixels. 

> The images are of subjects from greater than 100 countries, with significant imbalance due to visa issuance patterns. 

> The images are of subjects of all ages, including children, again with imbalance due to visa issuance demand. 

> Many of the images are live capture. A substantial number of the images are photographs of paper photographs. 

> When these images are input to the algorithm, they are labelled as being of type "ISO" - see Table 4 of the FRVT 
API. 

3.3 Mugshot images 

> The number of images is O(10^6).

> The number of subjects is O(10^5).

> The number of subjects with two images is O(10^5).

> The images have geometry in reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type.

> The images are of variable sizes. The median IOD is 104 pixels. The mean IOD is 123 pixels. 

> The images are of subjects from the United States. 

> The images are of adults. 

> The images are all live capture. 

> When these images are input to the algorithm, they are labelled as being of type "mugshot" - see Table 4 of the 
FRVT API. 

3.4 Selfie images 

> The number of images is below 500. 

> The number of subjects is below 500. 

> All subjects have a selfie image, and a portrait image. 

> The portrait images are in reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type. 

> The selfie images vary: taken with camera above and below eye level, with one hand or two hands. Pitch angles 
vary more than yaw angles, which are frontal. Some perspective distortion is evident. 

> The images have mean IOD of 140 pixels. 

> The images are of subjects from the United States. 

> The images are of adults. 

> The images are all live capture. 

> When these images are input to the algorithm, they are labelled as being of type "wild" - see Table 4 of the FRVT 
API. 

3.5 Webcam images 

> The number of images is below 1500. 

> The number of subjects is below 1500. 

> All subjects have a webcam image, and a portrait image. 

> The portrait images are in reasonable conformance with the ISO/IEC 19794-5 Full Frontal image type. 

> The webcam images are taken with camera at a typical head height, with mild pitch angles, low yaw angles, but 
some variation in range, such that low perspective distortion is sometimes evident. 

> The images have mean IOD of 68 pixels (sd=12). 

> The images are of subjects from the United States. 

> The images are of adults. 

> The images are all live capture. 

> When these images are input to the algorithm, they are labelled as being of type "wild" - see Table 4 of the FRVT 
API. 
3.6 Wild images

> The number of images is O(10^5).

> The number of subjects is O(10^3).

> The number of subjects with two images is O(10^3).

> The images include many photojournalism-style images. Images are given to the algorithm using a variable but
generally tight crop of the head. Resolution varies very widely. The images are arbitrarily unconstrained. Pose
varies considerably around the yaw and pitch axes. Faces can be occluded, including by hair and hands.

> The images are of adults. 

> All of the images are live capture, none are scanned. 

> When these images are input to the algorithm, they are labelled as being of type "WILD" - see Table 4 of the FRVT 
API. 

4 Results 

4.1 Test goals 

> To state overall accuracy. 

> To compare algorithms. 

4.2 Test design 

Method: For visa images: 

> The comparisons are of visa photos against visa photos. 

> The number of genuine comparisons is O(10^4).

> The number of impostor comparisons is O(10^10).

> The comparisons are fully zero-effort, meaning impostors are paired without attention to sex, age or other covariates.
However, later analysis is conducted on subsets.

> The number of persons is O(10^5).

> The number of images used to make 1 template is 1.

> The number of templates used to make each comparison score is two, corresponding to simple one-to-one verification.

For mugshot images: 

> The comparisons are of mugshot photos against mugshot photos. 

> The number of genuine comparisons is O(10^5).

> The number of impostor comparisons is 0(10'). 

> The comparisons are fully zero-effort, meaning impostors are paired without attention to sex, age or other covariates.

> The number of persons is O(10^6).

> The number of images used to make 1 template is 1.

> The number of templates used to make each comparison score is two, corresponding to simple one-to-one verification.

For selfie images: 

> The comparisons are of selfie photos against portrait photos. 

> The number of genuine comparisons is O(10^2).

> The number of impostor comparisons is O(10^8): selfies are compared with portraits of O(10^6) other subjects.

> The comparisons are fully zero-effort, meaning impostors are paired without attention to sex, age or other covariates.

> The number of persons is O(10^6).

> The number of images used to make 1 template is 1.

> The number of templates used to make each comparison score is two, corresponding to simple one-to-one verification.

For webcam images: 

> The comparisons are of webcam photos against portrait photos. 

> The number of genuine comparisons is O(10^3).

> The number of impostor comparisons is O(10^9): webcams are compared with portraits of O(10^6) other subjects.

> The comparisons are fully zero-effort, meaning impostors are paired without attention to sex, age or other covariates.

> The number of persons is O(10^6).

> The number of images used to make 1 template is 1.

> The number of templates used to make each comparison score is two, corresponding to simple one-to-one verification.

For child exploitation images: 

> The comparisons are of unconstrained child exploitation photos against others of the same type. 

> The number of genuine comparisons is O(10^4).

> The number of impostor comparisons is 0(10'). 

> The comparisons are fully zero-effort, meaning impostors are paired without attention to sex, age or other covariates.

> The number of persons is O(10^3).

> The number of images used to make 1 template is 1. 




2017/07/31 12:20:11 







> The number of templates used to make each comparison score is two, corresponding to simple one-to-one verification.

> We produce two performance statements. The first is a DET, as used for visa and mugshot images. The second is
a cumulative match characteristic (CMC) summarizing a simulated one-to-many search process. This is done as
follows.

- We regard M enrollment templates as items in a gallery. 

- These M templates come from M > N individuals, because multiple images of a subject are present in the 
gallery under separate identifiers. 

- We regard the verification templates as search templates. 

- For each search we compute the rank of the highest scoring mate. 

- This process should properly be conducted with a 1:N algorithm, such as those tested in NIST IR 8009. We
use the 1:1 algorithms in a simulated 1:N mode here to a) better reflect what a child exploitation analyst does,
and b) show that algorithm efficacy is better than that revealed in the verification DETs.
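
The simulated one-to-many search described in the steps above can be sketched in Python as follows. This is an illustration only: the data layout (one list of 1:1 scores per search, with a parallel list of mate flags) and the function names are assumptions, and ties are resolved in favour of the mate.

```python
def best_mate_rank(scores, is_mate):
    """Rank (1 = best) of the highest-scoring mate among all gallery scores.

    scores[j] is the 1:1 score of the search template against gallery item j;
    is_mate[j] is True when gallery item j is the same person.
    Returns None when the gallery holds no mate for this search.
    """
    mate_scores = [s for s, m in zip(scores, is_mate) if m]
    if not mate_scores:
        return None
    best = max(mate_scores)
    # Every gallery item scoring strictly above the best mate is a non-mate.
    return 1 + sum(1 for s in scores if s > best)

def cmc(ranks, max_rank):
    """CMC(r) for r = 1..max_rank: fraction of mated searches with best-mate rank <= r."""
    valid = [r for r in ranks if r is not None]
    return [sum(1 for x in valid if x <= r) / len(valid) for r in range(1, max_rank + 1)]

# Two hypothetical searches against a three-item gallery.
ranks = [
    best_mate_rank([0.9, 0.5, 0.3], [False, True, True]),
    best_mate_rank([0.2, 0.8, 0.1], [False, True, False]),
]
print(ranks, cmc(ranks, 3))
```

Because a subject may have several gallery entries, only the highest-scoring mate determines the rank. Lower-scoring mates do not count against the search.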


4.3 Failure to enrol 


Algorithm               Failure to Enrol Rate
Name                    CHILD-EXPLOIT  MUGSHOT        SELFIES        VISA           WEBCAM         WILD

3divi-000               0.2019         0.0019         0.0202         0.0008         0.0020         0.2070
ayonix-000              0.0000         0.0109         0.0751         0.0137         0.0109         0.0000
dermalog-001            0.9109         0.0045         0.0954         0.0013         0.0471         0.3979
dermalog-002            0.9109         0.0045         0.0954         0.0013         0.0471         0.3979
dermalog-003            0.0434         0.0007         0.0000         0.0025         0.0007         0.0701
digitalbarriers-000     0.5469         0.0043         0.0925         0.0019         0.0184         0.5170
isityou-000             0.4714         0.0022         0.0665         0.0010         0.0116         0.4586
itmo-001                0.5751         0.0103         -              0.0047         -              0.7752
morpho-000              0.0000         0.0000         0.0000         0.0000         0.0000         0.0000
neurotechnology-000     0.0000         0.0000         0.0000         0.0000         0.0000         0.0163
ntechlab-000            0.2496         0.0015         0.0058         0.0016         0.0007         0.1099
ntechlab-001            0.0926         0.0009         0.0029         0.0005         0.0007         0.0584
rankone-000             0.0187         0.0005         0.0000         0.0003         0.0007         0.2349
rankone-001             0.0012         0.0001         0.0000         0.0000         0.0000         0.0858
samtech-000             0.5474         0.0052         0.0491         0.0042         0.0252         0.7023
tongyitrans-001         0.0000         0.0068         0.0462         0.0040         0.0055         0.0000
tongyitrans-002         0.3609         0.0078         0.0462         0.0040         0.0055         0.0000
tupel-001               -              0.0030         0.0000         0.0018         0.0000         -
vcog-001                0.1579         -              0.0058         0.0018         0.0000         -
vcog-002                0.2209         0.0021         0.0087         0.0019         0.0007         0.1672
vigilantsolutions-000   0.5580         0.0018         0.0462         0.0007         0.0109         0.5927
vigilantsolutions-001   0.3585         0.0010         0.0116         0.0004         0.0048         0.3262
visionlabs-001          0.2699         0.0014         0.0058         0.0014         0.0020         0.1803
vocord-001              0.4732         0.0158         0.0520         0.0038         0.0348         0.4494
vocord-002              0.3782         0.0015         0.0029         0.0037         0.0171         0.1992
yitu-000                0.3475         0.0015         0.0029         0.0013         0.0014         0.1591

Table 3: FTE is the proportion of failed template generation attempts. Failures can occur because the software throws an exception, 
or because the software electively refuses to process the input image. This would typically occur if a face is not detected. FTE is 
measured as the number of function calls that give a non-zero error code, OR that give a "small" template. This is defined as one 
whose size is less than 0.3 times the median template size. This second rule is needed because some algorithms incorrectly fail to 
return a non-zero error code when template generation fails. 

The effects of FTE are included in the accuracy results later in this report: any template comparison that involves a failed
template is taken to produce a low similarity score.
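
The FTE counting rule in the caption of Table 3 can be sketched in Python as follows. This is an illustration, not the test harness code. The input layout, a list of (return code, template size in bytes) pairs per template-generation call, is hypothetical, and taking the median over calls that returned a zero error code is an assumption that the caption does not spell out.

```python
import statistics

def failure_to_enrol_rate(calls, small_fraction=0.3):
    """Fraction of template-generation calls counted as failures.

    A call fails if it returns a non-zero error code, OR if its template is
    "small", i.e. below small_fraction times the median template size.
    """
    # Assumption: the median is taken over calls that reported success.
    ok_sizes = [size for code, size in calls if code == 0]
    median_size = statistics.median(ok_sizes) if ok_sizes else 0
    failures = sum(
        1 for code, size in calls
        if code != 0 or size < small_fraction * median_size
    )
    return failures / len(calls)

# Hypothetical calls: eight normal templates, one error, one undersized template.
calls = [(0, 512)] * 8 + [(1, 0), (0, 100)]
print(failure_to_enrol_rate(calls))
```

The second condition catches the case described in the caption, where an algorithm returns a zero error code even though template generation has failed.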


4.4 Recognition accuracy 


Core algorithm accuracy is stated via: 
> The summary table of Figure 2;

> The visa image DETs of Figure 2; 

> The mugshot DETs of Figure 3;

> The selfie-portrait DETs of Figure 4; 

> The webcam-portrait DETs of Figure 5; 

> The child-exploitation DET of Figure 7; 

> The child-exploitation CMC of Figure 8. 

Figure 12 shows dependence of false match rate on algorithm score threshold. This allows a deployer to set a threshold 
to target a particular false match rate appropriate to the security objectives of the application. 

Figure 14 likewise shows FMR(T) but for mugshots, and specifically for four subsets of the population.

Note that in both the mugshot and visa sets false match rates vary with the ethnicity, age, and sex, of the enrollee and 
impostor - see section 4.6. 
Figure 2: For the visa images, detection error tradeoff (DET) characteristics showing false non-match rate vs. false match rate plotted parametrically on threshold, T. The
scales are logarithmic in order to show many decades of FMR.

Figure 3: For the mugshot images, detection error tradeoff (DET) characteristics showing false non-match rate vs. false match rate plotted parametrically on threshold, T.
The scales are logarithmic in order to show decades of FMR.

4.5 Genuine distribution stability

4.5.1 Effect of birth place on the genuine distribution 

Background: Both skin tone and bone structure vary geographically. Prior studies have reported variations in FNMR 
and FMR. 

Goal: To measure false non-match rate (FNMR) variation with country of birth. 

Methods: Thresholds are determined that give FMR = {0.001, 0.0001} over the entire impostor set. Then FNMR is measured
over 1000 bootstrap replications of the genuine scores. Only those countries with at least 140 individuals are
included in the analysis.
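
The bootstrap step in the Methods can be sketched in Python as follows. This is a minimal illustration, assuming the genuine scores for one country are held in a list and resampled with replacement. The function name, the seed, and the percentile-interval summary are assumptions, since the report does not say how the replications are summarized.

```python
import random

def bootstrap_fnmr(genuine_scores, threshold, replications=1000, seed=0):
    """Return sorted FNMR estimates from bootstrap resamples of the genuine scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(genuine_scores)
    estimates = []
    for _ in range(replications):
        sample = rng.choices(genuine_scores, k=n)  # resample with replacement
        estimates.append(sum(1 for u in sample if u < threshold) / n)
    estimates.sort()
    return estimates

# Hypothetical genuine scores: 10 below and 90 above a threshold of 0.5.
scores = [0.1] * 10 + [0.9] * 90
est = bootstrap_fnmr(scores, 0.5)
low, high = est[int(0.025 * len(est))], est[int(0.975 * len(est))]
print(f"FNMR 95% interval: [{low:.3f}, {high:.3f}]")
```

The spread of the replicated estimates indicates how much of the between-country variation in FNMR could arise from sampling alone.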

Results: Figure 15 shows FNMR by country of birth for the two thresholds. 

Caveats: The results may not relate to subject-specific properties. Instead they could reflect image-specific quality differences,
which could occur due to collection protocol or software processing variations.

4.5.2 Effect of age on genuine subjects

Background: Faces change appearance throughout life. Face recognition algorithms have previously been reported to
give better accuracy on older individuals (See NIST IR 8009). 

Goal: To quantify false non-match rates (FNMR) as a function of age. We do not aim to quantify ageing effects here as the 
separation between two samples is limited to just a few years. 

Methods: Using the visa images, thresholds are determined that give FMR = 0.001 and 0.0001 over the entire impostor 
set. Then FNMR is measured over 1000 bootstrap replications of the genuine scores. Only those countries with at least 30 
individuals are included in the analysis. 

Results: For the visa images, Figure 16 shows false non-match rates for genuine users as a function of age group.
The notable aspects are: 

> Younger subjects give considerably higher FNMR. This is likely due to rapid growth and change in facial appearance.

> FNMR trends down throughout life. The last bin, AGE > 72, contains fewer than 140 mated pairs, and may be 
affected by small sample size. 

Caveats: None. 
4.6 Impostor distribution stability

4.6.1 Effect of birth place on the impostor distribution 

Background: Facial appearance varies geographically, in terms of skin tone, cranio-facial structure, and size. This
section addresses whether false match rates vary intra- and inter-regionally. 

Goals: 

> To show the effect of birth region of the impostor and enrollee on false match rates. 

> To determine whether some algorithms give better impostor distribution stability. 

Methods: 


> For the visa images, NIST defined 10 regions: Sub-Saharan Africa, South Asia, Polynesia, North Africa, Middle 
East, Europe, East Asia, Central and South America, Central Asia, and the Caribbean. 

> For the visa images, NIST mapped each country of birth to a region. There is some arbitrariness to this. For example,
Egypt could reasonably be assigned to the Middle East instead of North Africa. An alternative methodology
could, for example, assign the Philippines to both Polynesia and East Asia.

> FMR is computed for cases where all face images of impostors born in region r_2 are compared with enrolled face
images of persons born in region r_1:

$$\mathrm{FMR}(r_1, r_2, T) = \frac{1}{N_{r_1,r_2}} \sum_{i=1}^{N_{r_1,r_2}} H(v_i - T) \qquad (5)$$

where N_{r_1,r_2} is the number of impostor scores, v_i, in that cell, the same threshold, T, is used in all cells, and H is
the unit step function. The threshold is set to give FMR(T) = 0.001 over the entire set of visa image impostor comparisons.

> This analysis is then repeated by country-pair, but only for those country pairs where both have at least 1000 images
available. The countries (listed in footnote 1) appear in the axes of graphs that follow.

> The mean number of impostor scores in any cross-region bin is 33 million. The smallest number of impostor scores
in any bin is 135000, for Central Asia - North Africa. While these counts are large enough to support reasonable
significance, the number of individual faces is much smaller, O(N^0.5).

> The number of impostor scores in each cross-country bin is shown in Figure 71.
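
The per-cell computation of Equation (5) can be sketched in Python as follows. This is an illustration only: the input layout, an iterable of (enrollee region, impostor region, score) tuples, and the function name are assumptions.

```python
from collections import defaultdict

def cross_region_fmr(comparisons, threshold):
    """Return {(enrollee_region, impostor_region): FMR} at one fixed threshold.

    comparisons: iterable of (enrollee_region, impostor_region, impostor_score).
    The same threshold is applied in every cell, as in Eq. 5.
    """
    counts = defaultdict(int)
    matches = defaultdict(int)
    for r1, r2, score in comparisons:
        counts[(r1, r2)] += 1
        if score >= threshold:  # H(v - T), with H(0) = 1
            matches[(r1, r2)] += 1
    return {cell: matches[cell] / counts[cell] for cell in counts}

# Hypothetical impostor comparisons.
comparisons = [
    ("Europe", "Europe", 0.9),
    ("Europe", "Europe", 0.1),
    ("Europe", "East Asia", 0.2),
    ("East Asia", "Europe", 0.8),
]
print(cross_region_fmr(comparisons, 0.5))
```

Cells are keyed by ordered pairs, so the (r_1, r_2) and (r_2, r_1) entries are computed from different comparisons and need not be equal.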

Results: Subsequent figures show heatmaps that use color to represent the base-10 logarithm of the false match rate.
Red colors indicate high (bad) false match rates. Dark colors indicate benign false match rates. There are two series of
graphs corresponding to aggregated geographical regions, and to countries. The notable observations are:

> The on-diagonal elements correspond to within-region impostors. FMR is generally above the nominal value of
FMR = 0.001. In particular, FMR is usually higher in Sub-Saharan Africa, South Asia, and the Caribbean.
Europe and Central Asia, on the other hand, usually give FMR closer to the nominal value.

> The off-diagonal elements correspond to across-region impostors. The highest FMR is produced between the 
Caribbean and Sub-Saharan Africa. 

> Algorithms vary. 

1 These are Argentina, Australia, Brazil, Chile, China, Costa Rica, Cuba, Czech Republic, Dominican Republic, Ecuador, Egypt, El Salvador, Germany,
Ghana, Great Britain, Greece, Guatemala, Haiti, Hong Kong, Honduras, Indonesia, India, Israel, Jamaica, Japan, Kenya, Korea, Lebanon, Mexico,
Malaysia, Nepal, Nigeria, Peru, Philippines, Pakistan, Poland, Romania, Russia, South Africa, Saudi Arabia, Thailand, Trinidad, Turkey, Taiwan,
Ukraine, Venezuela, and Vietnam.

> We computed the same quantities for a global FMR = 0.0001. The effects are similar.

Caveats: 

> The effects of variable impostor rates on one-to-many identification systems may well differ from what's implied 
by these one-to-one verification results. Two reasons for this are a) the enrollment galleries are usually imbalanced 
across countries of birth, age and sex; b) one-to-many identification algorithms often implement techniques aimed 
at stabilizing the impostor distribution. Further research is necessary. 

> In principle, the effects seen in this subsection could be due to differences in the image capture process. We consider 
this unlikely since the effects are maintained across geography - e.g. Caribbean vs. Africa, or Japan vs. China. 


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Cross region FMR at threshold T = 3.057 for algorithm 3divi_000, giving FMR(T) = 0.0001 globally. 
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matrix is not quite symmetric because images in the enrollment and verification sets are different. 















Cross region FMR at threshold T = 0.919 for algorithm ayonix OOO, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 

















Cross region FMR at threshold T = 81.064 for algorithm dermalog_001, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 













Cross region FMR at threshold T = 81.164 for algorithm dermalog_002, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 











Cross region FMR at threshold T = 84.718 for algorithm dermalog_003, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 
















Cross region FMR at threshold T = 0.646 for algorithm digitalbarriers_000, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 





















Cross region FMR at threshold T = 0.700 for algorithm digitalbarriers_001, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 


































Cross region FMR at threshold T = 23.498 for algorithm isityou_000, giving FMR(T) = 0.0001 globally. 
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in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 





















Cross region FMR at threshold T = 990.194 for algorithm itmo_001, giving FMR(T) = 0.0001 globally. 
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matrix is not quite symmetric because images in the enrollment and verification sets are different. 



















Cross region FMR at threshold T = 3846.708 for algorithm morpho_000, giving FMR(T) = 0.0001 globally. 




in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 

















Cross region FMR at threshold T = 43.010 for algorithm neurotechnology_000, giving FMR(T) = 0.0001 globally. 
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increase in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 






















Cross region FMR at threshold T = 0.105 for algorithm ntechlab_000, giving FMR(T) = 0.0001 globally. 





[Figure: cross-region FMR matrix. Axis label: Region of birth of impostor.]




in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 






























Cross region FMR at threshold T = 0.103 for algorithm ntechlab_001, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 
























Cross region FMR at threshold T = 0.614 for algorithm rankone_000, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 












Cross region FMR at threshold T = 0.692 for algorithm rankone_001, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 



















Cross region FMR at threshold T = 80.766 for algorithm samtech_000, giving FMR(T) = 0.0001 globally. 




in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 




































Cross region FMR at threshold T = 10.120 for algorithm tongyitrans_001, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 










Cross region FMR at threshold T = 3.971 for algorithm tongyitrans_002, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 












Cross region FMR at threshold T = 1.000 for algorithm tupel_001, giving FMR(T) = 0.0001 globally. 


matrix is not quite symmetric because images in the enrollment and verification sets are different. 






























Cross region FMR at threshold T = 18.505 for algorithm vcog_001, giving FMR(T) = 0.0001 globally. 


matrix is not quite symmetric because images in the enrollment and verification sets are different. 
















Cross region FMR at threshold T = 0.428 for algorithm vcog_002, giving FMR(T) = 0.0001 globally. 


matrix is not quite symmetric because images in the enrollment and verification sets are different. 















Cross region FMR at threshold T = 0.114 for algorithm vigilantsolutions_000, giving FMR(T) = 0.0001 globally. 


increase in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 


















Cross region FMR at threshold T = 3.320 for algorithm vigilantsolutions_001, giving FMR(T) = 0.0001 globally. 


increase in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 

















Cross region FMR at threshold T = 0.080 for algorithm visionlabs_001, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 


















Cross region FMR at threshold T = 0.903 for algorithm vocord_001, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 

















Cross region FMR at threshold T = 0.867 for algorithm vocord_002, giving FMR(T) = 0.0001 globally. 


in FMR. The matrix is not quite symmetric because images in the enrollment and verification sets are different. 
























Cross region FMR at threshold T = 10.098 for algorithm yitu_000, giving FMR(T) = 0.0001 globally. 


matrix is not quite symmetric because images in the enrollment and verification sets are different. 
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4.6.2 Effect of age on impostors 

Background: This section shows the effect of age on the impostor distribution. The ideal behaviour is that the age of the 
enrollee and the impostor would not affect impostor scores. This would support FMR stability over sub-populations. 

Goals: 

> To show the effect of relative ages of the impostor and enrollee on false match rates. 

> To determine whether some algorithms have better impostor distribution stability. 

Methods: 

> Define 14 age group bins, spanning 0 to over 100 years old. 

> Compute FMR over all impostor comparisons for which the subjects in the enrollee and impostor images have ages 
in a given pair of bins, one bin for the enrollee and one for the impostor. 

> Compute FMR over all impostor comparisons for which the subjects are additionally of the same sex, and born in 
the same geographic region. 
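
The steps above can be sketched as a two-dimensional tabulation of false matches over age bins. This is an illustrative sketch only: the report does not list its bin edges, so `AGE_EDGES` below is a hypothetical choice that merely yields 14 bins, and the function name, the `keep` argument and the score convention (match when score >= threshold) are assumptions.

```python
import numpy as np

# Hypothetical edges (in years). 13 edges give 14 bins, the last one
# open-ended. These are NOT the bins used in the report.
AGE_EDGES = np.array([4, 10, 16, 20, 30, 40, 50, 60, 70, 80, 90, 95, 100])


def age_bin_fmr(scores, enrollee_age, impostor_age, threshold,
                edges=AGE_EDGES, keep=None):
    """FMR at a fixed threshold for each (enrollee age bin, impostor age bin).

    keep is an optional boolean mask selecting a subset of comparisons,
    for example same-sex, same-region pairs. Empty cells are NaN.
    """
    scores = np.asarray(scores, dtype=float)
    row = np.digitize(enrollee_age, edges)  # bin index of the enrollee
    col = np.digitize(impostor_age, edges)  # bin index of the impostor
    if keep is None:
        keep = np.ones(scores.shape, dtype=bool)
    keep = np.asarray(keep, dtype=bool)

    n_bins = len(edges) + 1
    comparisons = np.zeros((n_bins, n_bins))
    false_matches = np.zeros((n_bins, n_bins))
    # np.add.at accumulates correctly when the same cell is hit repeatedly.
    np.add.at(comparisons, (row[keep], col[keep]), 1)
    np.add.at(false_matches, (row[keep], col[keep]), scores[keep] >= threshold)

    with np.errstate(invalid="ignore", divide="ignore"):
        return false_matches / comparisons  # 0/0 gives NaN for empty cells
```

For the same-sex, same-region variant, a mask such as `keep = (enrollee_sex == impostor_sex) & (enrollee_region == impostor_region)` could be passed, where those arrays are hypothetical per-comparison metadata.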

Results: 

The notable aspects are: 

> Diagonal dominance: Impostors are more likely to be matched against their same age group. 

> Same sex and same region impostors are more successful. On the diagonal, an impostor is more likely to succeed 
by posing as someone of the same sex. If $\Delta \log_{10} \mathrm{FMR} = 0.2$, then same-sex same-region FMR exceeds the all-pairs 
FMR by a factor of $10^{0.2} \approx 1.6$. 

> Young child impostors give elevated FMR against young children. Older adult impostors give elevated FMR 
against older adults. These effects are quite large: for example, if $\log_{10} \mathrm{FMR}$ is 1.0 larger than for a 32 year old, then 
these groups have higher FMR by a factor of $10^{1} = 10$. This would imply an FMR above 0.01 for a nominal (global) 
FMR = 0.001. 

> Algorithms vary. 

> We computed the same quantities for a global FMR = 0.0001. The effects are similar. 
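
The arithmetic behind the factors quoted in the results is a conversion between a difference in log10 FMR and a multiplicative change in FMR. A small sketch follows; the helper names are illustrative and not taken from the report.

```python
import math


def fmr_ratio(delta_log10_fmr):
    """Multiplicative change in FMR implied by a difference in log10 FMR."""
    return 10.0 ** delta_log10_fmr


def delta_log10(fmr_group, fmr_reference):
    """Difference in log10 FMR between a group and a reference."""
    return math.log10(fmr_group) - math.log10(fmr_reference)


print(round(fmr_ratio(0.2), 2))  # 1.58, quoted as 1.6 in the text
print(fmr_ratio(1.0))            # 10.0
```

Applied to the second example, a nominal FMR of 0.001 multiplied by `fmr_ratio(1.0)` gives 0.01, the level cited for the youngest and oldest groups.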

Note the calculations in this section include impostors paired across all countries of birth. 
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